Add runner to wait until snapshot has been created #1047

dliappis · 2020-08-13T18:14:03Z

For snapshot creation (and metrics), we have so far relied only on the `create-snapshot` runner with a blocking call. This commit introduces a new runner, `wait-for-snapshot-create`, to complement `create-snapshot`. This is similar to `restore-snapshot` and `wait-for-recovery`, and helps in cases where network connections may be terminated, making a blocking call unsuitable.
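
To illustrate the idea of polling instead of blocking, here is a minimal sketch. It is not the runner added in this PR: `FakeSnapshotClient`, `wait_for_snapshot`, the repository name and the canned states are all hypothetical, and the response shape only loosely follows the snapshot status API. Each status check is a short request, so no single connection has to stay open for the whole duration of the snapshot.

```python
import asyncio


class FakeSnapshotClient:
    """Hypothetical stand-in for an Elasticsearch client; returns canned states."""

    def __init__(self, states):
        self._states = iter(states)

    async def status(self, repository, snapshot):
        # Assumed shape, loosely modelled on the snapshot status API response body.
        return {"snapshots": [{"snapshot": snapshot, "state": next(self._states)}]}


async def wait_for_snapshot(client, repository, snapshot, recheck_period=1.0):
    """Poll with short requests until the snapshot succeeds or fails."""
    attempts = 0
    while True:
        response = await client.status(repository=repository, snapshot=snapshot)
        state = response["snapshots"][0]["state"]
        attempts += 1
        if state == "SUCCESS":
            return attempts
        if state == "FAILED":
            raise RuntimeError(f"Snapshot [{snapshot}] failed.")
        # Still in progress: wait before the next short-lived status request.
        await asyncio.sleep(recheck_period)


client = FakeSnapshotClient(["STARTED", "STARTED", "SUCCESS"])
attempts = asyncio.run(
    wait_for_snapshot(client, "backups", "snapshot-001", recheck_period=0.01)
)
print(attempts)
```

The recheck period here plays the role that `completion-recheck-wait-period` plays in the documented operation.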

danielmitterdorfer

Looks good! I left a few comments.

danielmitterdorfer · 2020-08-14T05:36:49Z

docs/track.rst

@@ -1119,7 +1119,24 @@ With the operation ``create-snapshot`` you can `create a snapshot <https://www.e
 * ``request-params`` (optional): A structure containing HTTP request parameters.

 .. note::
-    When ``wait-for-completion`` is set to ``true`` Rally will report the achieved throughput in byte/s.
+    It's not recommend to rely on ``wait-for-completion=true``. Instead you should keep the default value (``False``) and use an additional ``wait-for-snapshot-create`` operation in the next step.


nit: "not recommend" -> "not recommended"

Addressed in 32ee4ab

danielmitterdorfer · 2020-08-14T05:37:20Z

docs/track.rst

@@ -1119,7 +1119,24 @@ With the operation ``create-snapshot`` you can `create a snapshot <https://www.e
 * ``request-params`` (optional): A structure containing HTTP request parameters.

 .. note::
-    When ``wait-for-completion`` is set to ``true`` Rally will report the achieved throughput in byte/s.
+    It's not recommend to rely on ``wait-for-completion=true``. Instead you should keep the default value (``False``) and use an additional ``wait-for-snapshot-create`` operation in the next step.
+    This is mandatory on the `Elastic Cloud <https://www.elastic.co/cloud>`_ or environments where Elasticsearch is sitting behind a network element that may terminate the blocking connection after a timeout.


on the Elastic Cloud -> on Elastic Cloud?

"sitting behind a network element" -> "connected via intermediate network components, such as proxies, that may terminate ..."?

Addressed in 32ee4ab

danielmitterdorfer · 2020-08-14T05:41:38Z

esrally/driver/runner.py

+                # Possible states:
+                # https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html#get-snapshot-status-api-response-body
+                if response_state == "FAILED":
+                    self.logger.error("Snapshot [%s] failed. Response status:\n%s", snapshot, json.dumps(response))


"Response status" -> "Response"? (as it is the actual, full response). I also think we should pretty-print it by specifying e.g. indent=2 in json.dumps.

Addressed in 32ee4ab
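
For reference, a small sketch of the difference the `indent` argument makes. The `response` dictionary below is made up for illustration and is not an actual snapshot status response.

```python
import json

# Hypothetical response body, for illustration only.
response = {"snapshots": [{"snapshot": "snapshot-001", "state": "FAILED"}]}

compact = json.dumps(response)           # everything on a single line
pretty = json.dumps(response, indent=2)  # one key per line, nested by two spaces

print(compact)
print(pretty)
```

The pretty-printed form is longer but much easier to scan in a log file when the response is deeply nested.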

danielmitterdorfer · 2020-08-14T05:42:33Z

esrally/driver/runner.py

+                # https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html#get-snapshot-status-api-response-body
+                if response_state == "FAILED":
+                    self.logger.error("Snapshot [%s] failed. Response status:\n%s", snapshot, json.dumps(response))
+                    raise exceptions.RallyAssertionError(


Nit: Any reason why it is formatted this way? I tried putting everything on the same line and ended up with a line width of 110 which should still be fine?

Addressed in 32ee4ab

danielmitterdorfer · 2020-08-14T05:45:42Z

esrally/track/track.py

@@ -420,7 +420,7 @@ class OperationType(Enum):
    RawRequest = 5
    WaitForRecovery = 6
    CreateSnapshot = 7


I think we could move CreateSnapshot again to administrative actions (i.e. anything > 1000) because Rally would not report request metrics by default for administrative operations (it's usually not interesting to know).

Addressed in 32ee4ab
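
A minimal sketch of the convention described in this comment, assuming that administrative operations are simply the ones with a value above 1000. The enum members' values and the `admin_op` helper are illustrative and not taken from the diff.

```python
from enum import Enum


class OperationType(Enum):
    # Regular operations (illustrative values): request metrics are reported.
    RawRequest = 5
    WaitForRecovery = 6
    WaitForSnapshotCreate = 7
    # Administrative operations: anything above 1000 (hypothetical value).
    CreateSnapshot = 1001

    @property
    def admin_op(self):
        # Assumed convention from the review comment: values > 1000 are administrative.
        return self.value > 1000


for op in OperationType:
    print(op.name, op.admin_op)
```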

danielmitterdorfer · 2020-08-14T05:50:10Z

tests/driver/runner_test.py

@@ -2756,6 +2757,140 @@ async def test_create_snapshot_wait_for_completion(self, es):
            }
        })

+        params = {


Unfortunately I cannot comment at the specific line but in test_create_snapshot_no_wait we still mock es.snapshot.status and assert later on that it is not called. IMHO we should remove this now because there is no chance we'd ever call it anymore in the new runner implementation.

Addressed in 32ee4ab + 2b242b2 + 6523ec9

danielmitterdorfer · 2020-08-14T05:54:35Z

tests/driver/runner_test.py

+
+        r = runner.WaitForSnapshotCreate()
+
+        logger = logging.getLogger("esrally.driver.runner")


How about we just use the runner's logger with logger = r.logger (you can then probably just inline it in the with statement below)? This would also make it more refactoring-safe.

Addressed in 32ee4ab
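
A sketch of what the suggestion could look like in a test, using `unittest.mock`. The `WaitForSnapshotCreate` class below is a stripped-down, hypothetical stand-in with an invented `check` method; only the idea of patching `r.logger` directly comes from the comment.

```python
import logging
from unittest import mock


class WaitForSnapshotCreate:
    """Hypothetical, stripped-down stand-in for the runner under test."""

    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def check(self, snapshot, state):
        if state == "FAILED":
            self.logger.error("Snapshot [%s] failed.", snapshot)
            return False
        return True


r = WaitForSnapshotCreate()

# Patch the runner's own logger inline instead of looking it up by name,
# so the test keeps working if the logger name changes.
with mock.patch.object(r.logger, "error") as mocked_error:
    ok = r.check("snapshot-001", "FAILED")

mocked_error.assert_called_once_with("Snapshot [%s] failed.", "snapshot-001")
```

Because the test never hard-codes the logger's name, renaming or moving the module does not break it.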

danielmitterdorfer · 2020-08-14T05:56:57Z

docs/track.rst

+* ``snapshot`` (mandatory): The name of the snapshot that this operation will wait until it succeeds.
+* ``completion-recheck-wait-period`` (optional, defaults to 1 second): Time in seconds to wait in between consecutive attempts.
+
+Rally will report the achieved throughput in byte/s, the duration in seconds, the start and stop time in milliseconds and the total amount of files snapshotted as returned by the the `Elasticsearch snapshot status API call <https://www.elastic.co/guide/en/elasticsearch/reference/current/get-snapshot-status-api.html>`_.


I am not sure I'd document all the metadata that we return (except for the throughput)? Otherwise we should probably also document the respective metric keys and make clear that these metadata are only available with an Elasticsearch metrics store (contrary to the throughput).

Addressed in 32ee4ab

dliappis · 2020-08-14T08:00:50Z

Thanks for your comments! Could you PTAL?

danielmitterdorfer

Thanks for iterating. LGTM!

dliappis added the enhancement and :Track Management labels Aug 13, 2020

dliappis added this to the 2.0.2 milestone Aug 13, 2020

dliappis requested review from danielmitterdorfer and ebadyano August 13, 2020 18:14

dliappis self-assigned this Aug 13, 2020

dliappis force-pushed the add-wait-for-snapshot-create branch from 065beaf to 293922e Compare August 13, 2020 18:17

danielmitterdorfer reviewed Aug 14, 2020

dliappis added 3 commits August 14, 2020 10:54

Address PR comments

32ee4ab

and fix test after addressing PR comments

2b242b2

Don't mock status

6523ec9

dliappis requested a review from danielmitterdorfer August 14, 2020 08:00

danielmitterdorfer approved these changes Aug 14, 2020

dliappis merged commit be8622a into elastic:master Aug 14, 2020

dliappis deleted the add-wait-for-snapshot-create branch August 14, 2020 13:37
